NetOps 2.0 Transformation by Ray Belleville

Author:Ray Belleville [Belleville, Ray]
Language: eng
Format: epub
Publisher: eBookIt.com
Published: 0101-01-01T00:00:00+00:00


Chapter Five

Isolation

As I’ve mentioned before, there will be a lot of intertwined concepts across these pillars. When I initially created this book, I had visions of noticeable lines between documentation, isolation, repair, and escalation. In reality, documentation is at the heart of all of these concepts. It’s discussed here first for that reason. We’ve already discussed documenting the problem, but what about isolating a problem in the network?

This isolation is more about removing the problem from the network than about finding its cause. Accurate and relevant documentation is needed to answer questions such as these: if a network interface is experiencing many errors that impact performance, can it be shut down while it is repaired? Is there network redundancy in place, capable of taking over?
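The question of whether an interface can be shut down can be sketched as a simple reachability check. This is a minimal illustration, not a production tool: the topology, the device names, and the function names are all hypothetical, and it assumes at most one link between any pair of devices. It only tests physical connectivity and says nothing about how the routing protocols will converge.

```python
from collections import deque

# Hypothetical topology: each tuple is one link between two devices.
LINKS = [
    ("access-1", "dist-1"),
    ("access-1", "dist-2"),
    ("dist-1", "core-1"),
    ("dist-2", "core-1"),
    ("core-1", "server-1"),
]


def build_adjacency(links):
    """Turn a list of links into a device -> set of neighbors map."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def is_reachable(links, source, destination):
    """Breadth-first search: can source still reach destination?"""
    adjacency = build_adjacency(links)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def safe_to_isolate(links, link, source, destination):
    """True if source still reaches destination with `link` shut down."""
    # Compare as sets so ("a", "b") and ("b", "a") name the same link.
    remaining = [l for l in links if set(l) != set(link)]
    return is_reachable(remaining, source, destination)
```

With this sample topology, shutting down the link between access-1 and dist-1 leaves a path through dist-2, while the single link between core-1 and server-1 cannot be removed without cutting off the server.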

The same concept applies in a scheduled maintenance window. Can a specific change be applied to the network while traffic is re-routed so that users do not experience an outage? More on that later.

The tools for documenting network intents can be invaluable here. They allow the applications to be analyzed from a redundancy perspective. They also provide immediate confidence that specific devices or interfaces can be taken out of service to assist with troubleshooting and mitigating risk.

We often see that the only resource available to a network operations team is all-purpose documentation. As we discussed, a static document is not likely to have the right amount of detail to ensure the NetOps Engineer can perform this isolation.

I may know that the physical connectivity appears redundant. But is the routing architecture deployed so that it will converge without creating a more extensive outage? How current is the documentation set being used?

What if I believe there is proper redundancy, but the document is six months old? What if, through the network's regular operations, something was moved, added, or changed that somehow impacted this redundancy?
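One way to picture the staleness problem is to compare the date a document was last updated against a log of changes. The sketch below is only an illustration: the record layout, the field names, and the sample entries are hypothetical, and it assumes the change log itself is complete.

```python
from datetime import date


def changes_since_document(documented_on, change_log):
    """Return the change records dated after the document was last updated."""
    return [change for change in change_log if change["date"] > documented_on]


# Hypothetical data: a document last updated in January and two later changes.
DOCUMENTED_ON = date(2024, 1, 15)
CHANGE_LOG = [
    {"date": date(2023, 12, 1), "device": "dist-1", "summary": "added uplink"},
    {"date": date(2024, 3, 9), "device": "dist-2", "summary": "moved uplink"},
    {"date": date(2024, 5, 20), "device": "core-1", "summary": "routing change"},
]

# Any record returned here means the document may no longer match the network.
unreviewed = changes_since_document(DOCUMENTED_ON, CHANGE_LOG)
```

If the returned list is not empty, the redundancy shown in the document should be verified against the live network before anything is taken out of service.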

Again, looking back to the statistic that 45% of network outages were avoidable and operator error was the cause, access to information in the form of relevant documentation can reduce these errors. There is nothing worse than having a small issue made into a disastrous problem simply through lack of visibility.

Still, part of documentation is the need for a visual aid during an isolation process, whether for the simple act of temporarily removing a problem area from the network or merely for knowing which elements in the network are part of the application path.

Do you know?

Which types of devices are along the path?

Where is redundancy deployed?

Which ACLs have been applied along that path?

Which interfaces and addressing are in use?

Is there an MPLS provider in the path?

Which QoS policies are applied, and what is the per-hop behavior?

How resilient are the devices?

Have there been any changes on the devices recently?
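The questions above can be treated as fields in a per-hop record, so that gaps in the documentation become visible before an isolation is attempted. This is a minimal sketch under stated assumptions: the class name, the field names, and the sample values are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class HopRecord:
    """What is documented about one device on the application path."""
    device: str
    device_type: Optional[str] = None            # type of device
    redundant: Optional[bool] = None             # is redundancy deployed here?
    acls: Optional[list] = None                  # ACLs applied at this hop
    interfaces: Optional[dict] = None            # interface name -> address
    crosses_mpls_provider: Optional[bool] = None
    qos_policy: Optional[str] = None             # policy and per-hop behavior
    resilience: Optional[str] = None             # notes on device resilience
    last_change: Optional[str] = None            # most recent known change


def unanswered(hop):
    """List the fields that are still undocumented (None) for this hop."""
    return [f.name for f in fields(hop) if getattr(hop, f.name) is None]


# Hypothetical hop with only part of the picture documented.
hop = HopRecord(device="dist-1", device_type="switch", redundant=True)
gaps = unanswered(hop)
```

An empty list of acls is treated as a documented answer (no ACLs applied), which is why the check looks for None and not for empty values.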


